Smart Paper Notes

Paying Attention to Astronomical Transients: Introducing the Time-series Transformer for Photometric Classification

Tarek Allam Jr. , Jason D. McEwen

Category: Machine Learning

2021-05-13

Future surveys such as the Legacy Survey of Space and Time (LSST) of the Vera C. Rubin Observatory will observe an order of magnitude more astrophysical transient events than any previous survey. With this deluge of photometric data, it will be impossible for all such events to be classified by humans alone. Recent efforts have sought to leverage machine learning methods to tackle the challenge of astronomical transient classification, with ever-improving success. Transformers are a recently developed deep learning architecture, first proposed for natural language processing, that have shown a great deal of recent success. In this work we develop a new transformer architecture, which uses multi-head self-attention at its core, for general multivariate time-series data. Furthermore, the proposed time-series transformer architecture supports the inclusion of an arbitrary number of additional features, while also offering interpretability. We apply the time-series transformer to the task of photometric classification, minimising the reliance on expert domain knowledge for feature selection, while achieving results comparable to state-of-the-art photometric classification methods. We achieve a logarithmic loss of 0.507 on imbalanced data in a representative setting using data from the Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC). Moreover, we achieve a micro-averaged receiver operating characteristic area under curve of 0.98 and a micro-averaged precision-recall area under curve of 0.87.

translated by Google Translate

Face Generation and Editing with StyleGAN: A Survey

Andrew Melnik , Maksim Miasayedzenkau , Dzianis Makarovets , Dzianis Pirshtuk , Eren Akbulut , Dennis Holzmann , Tarek Renusch , Gustav Reichert , Helge Ritter

Category: Computer Vision | Machine Learning

2022-12-18

Our goal with this survey is to provide an overview of state-of-the-art deep learning technologies for face generation and editing. We cover popular recent architectures and discuss the key ideas that make them work, such as inversion, latent representation, loss functions, training procedures, editing methods, and cross-domain style transfer. We focus particularly on GAN-based architectures that have culminated in the StyleGAN approaches, which allow generation of high-quality face images and offer rich interfaces for controllable semantic editing while preserving photo quality. We aim to provide an entry point into the field for readers who have basic knowledge of deep learning and are looking for an accessible introduction and overview.

Time series numerical association rule mining variants in smart agriculture

Iztok Fister Jr. , Dušan Fister , Iztok Fister , Vili Podgorelec , Sancho Salcedo-Sanz

Category: Neural and Evolutionary Computing

2022-12-07

Numerical association rule mining offers a very efficient way of mining association rules, where algorithms can operate directly on categorical and numerical attributes. These methods are suitable for mining different transaction databases, where data are entered sequentially. However, little attention has been paid to time series numerical association rule mining, which offers a new technique for extracting association rules from time series data. This paper presents a new algorithmic method for time series numerical association rule mining and its application in smart agriculture. We offer a concept of a hardware environment for monitoring plant parameters and a novel data mining method with practical experiments. The practical experiments showed the method's potential and opened the door for further extension.

Impact of Automatic Image Classification and Blind Deconvolution in Improving Text Detection Performance of the CRAFT Algorithm

Clarisa V. Albarillo , Proceso L. Fernandez Jr

Category: Computer Vision | Machine Learning

2022-11-29

Text detection in natural scenes has been a significant and active research subject in computer vision and document analysis because of its wide range of applications, as evidenced by the emergence of the Robust Reading Competition. One algorithm with good text detection performance in that competition is Character Region Awareness for Text Detection (CRAFT). Using the ICDAR 2013 dataset, this study investigates the impact of automatic image classification and blind deconvolution as image pre-processing steps to further enhance the text detection performance of CRAFT. The proposed technique automatically classifies the scene images into two categories, blurry and non-blurry, using a Laplacian operator with a threshold of 100. Before the CRAFT algorithm is applied, images categorized as blurry are further pre-processed using blind deconvolution to reduce the blur. The results revealed that the proposed method significantly enhanced the detection performance of CRAFT, as demonstrated by its IoU h-mean of 94.47% compared to the original 91.42% h-mean of CRAFT. This even outperformed the top-ranked SenseTime, whose h-mean is 93.62%.
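
The blurry versus non-blurry split described above can be illustrated with a small sketch. The abstract only states that a Laplacian operator with a threshold of 100 is used; the sketch below assumes the common "variance of the Laplacian" focus measure, and the function names `laplacian_variance` and `is_blurry` are hypothetical, not taken from the paper.

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian response (assumed focus measure)."""
    g = np.asarray(gray, dtype=np.float64)
    # Replicate border pixels so the output has the same shape as the input.
    p = np.pad(g, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * g
    return float(lap.var())

def is_blurry(gray, threshold=100.0):
    """Label an image as blurry when its Laplacian variance is below the threshold."""
    return laplacian_variance(gray) < threshold

# A flat image has no edges; a checkerboard has many sharp edges.
flat = np.full((32, 32), 128.0)
rows, cols = np.indices((32, 32))
checker = 255.0 * ((rows + cols) % 2)
```

Under this assumption, images flagged by `is_blurry` would be the ones passed to blind deconvolution before text detection.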

GET-DIPP: Graph-Embedded Transformer for Differentiable Integrated Prediction and Planning

Jiawei Sun , Chengran Yuan , Shuo Sun , Zhiyang Liu , Terence Goh , Anthony Wong , Keng Peng Tee , Marcelo H. Ang Jr

Category: Robotics

2022-11-11

Accurately predicting interactive road agents' future trajectories and planning a socially compliant and human-like trajectory accordingly are important for autonomous vehicles. In this paper, we propose a planning-centric prediction neural network, which takes surrounding agents' historical states and map context information as input, and outputs the joint multi-modal prediction trajectories for surrounding agents, as well as a sequence of control commands for the ego vehicle by imitation learning. An agent-agent interaction module along the time axis is proposed in our network architecture to better capture the relationships among all the other intelligent agents on the road. To incorporate the map's topological information, a Dynamic Graph Convolutional Neural Network (DGCNN) is employed to process the road network topology. In addition, the whole architecture can serve as a backbone for the Differentiable Integrated motion Prediction with Planning (DIPP) method by providing accurate prediction results and initial planning commands. Experiments are conducted on real-world datasets to demonstrate the improvements made by our proposed method in both planning and prediction accuracy compared to the previous state-of-the-art methods.

Probabilistic thermal stability prediction through sparsity promoting transformer representation

Yevgen Zainchkovskyy , Jesper Ferkinghoff-Borg , Anja Bennett , Thomas Egebjerg , Nikolai Lorenzen , Per Jr. Greisen , Søren Hauberg , Carsten Stahlhut

Category: Machine Learning (Statistics) | Machine Learning

2022-11-10

Pre-trained protein language models have demonstrated significant applicability in different protein engineering tasks. A common use of the latent representations of these pre-trained transformer models is to apply a mean pool across residue positions, reducing the feature dimensions for downstream tasks such as predicting biophysical properties or other functional behaviours. In this paper we provide a two-fold contribution to machine learning (ML) driven drug design. Firstly, we demonstrate the power of sparsity-promoting penalization of pre-trained transformer models to secure more robust and accurate melting temperature (Tm) prediction of single-chain variable fragments, with a mean absolute error of 0.23 °C. Secondly, we demonstrate the power of framing our prediction problem in a probabilistic framework. Specifically, we advocate adopting probabilistic frameworks, especially in the context of ML driven drug design.
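
The mean pooling step mentioned above can be sketched as follows. This is only an illustration of the baseline pooling operation, not of the paper's sparsity-promoting method; the function name `mean_pool`, the optional padding mask, and the array shapes are assumptions made for the example.

```python
import numpy as np

def mean_pool(residue_embeddings, mask=None):
    """Average per-residue embeddings of shape (L, D) into one vector of shape (D,).

    mask is an optional length-L sequence; truthy entries mark real residues,
    falsy entries mark padding positions that are excluded from the average.
    """
    x = np.asarray(residue_embeddings, dtype=np.float64)
    if mask is None:
        return x.mean(axis=0)
    keep = np.asarray(mask, dtype=bool)
    return x[keep].mean(axis=0)

# Three positions with 2-dimensional embeddings; the last one is padding.
embeddings = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
pooled = mean_pool(embeddings, mask=[1, 1, 0])
```

The pooled vector has a fixed size regardless of sequence length, which is what makes it convenient as input to a downstream regressor.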

Federated Learning Using Three-Operator ADMM

Shashi Kant , José Mairton B. da Silva Jr. , Gabor Fodor , Bo Göransson , Mats Bengtsson , Carlo Fischione

Category: Machine Learning

2022-11-08

Federated learning (FL) has emerged as an instance of the distributed machine learning paradigm that avoids the transmission of data generated on the users' side. Although data are not transmitted, edge devices have to deal with limited communication bandwidths, data heterogeneity, and straggler effects due to the limited computational resources of users' devices. A prominent approach to overcome such difficulties is FedADMM, which is based on the classical two-operator consensus alternating direction method of multipliers (ADMM). The common assumption of FL algorithms, including FedADMM, is that they learn a global model using data only on the users' side and not on the edge server. However, in edge learning, the server is expected to be near the base station and have direct access to rich datasets. In this paper, we argue that leveraging the rich data on the edge server is much more beneficial than utilizing only user datasets. Specifically, we show that the mere application of FL with an additional virtual user node representing the data on the edge server is inefficient. We propose FedTOP-ADMM, which generalizes FedADMM and is based on a three-operator ADMM-type technique that exploits a smooth cost function on the edge server to learn a global model in parallel with the edge devices. Our numerical experiments indicate that FedTOP-ADMM achieves a substantial gain of up to 33% in communication efficiency to reach a desired test accuracy with respect to FedADMM, including a virtual user on the edge server.

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning

Yu Meng , Martin Michalski , Jiaxin Huang , Yu Zhang , Tarek Abdelzaher , Jiawei Han

Category: Natural Language Processing | Machine Learning

2022-11-06

Recent studies have revealed the intriguing few-shot learning ability of pretrained language models (PLMs): they can quickly adapt to a new task when fine-tuned on a small amount of labeled data formulated as prompts, without requiring abundant task-specific annotations. Despite their promising performance, most existing few-shot approaches that only learn from the small training set still underperform fully supervised training by nontrivial margins. In this work, we study few-shot learning with PLMs from a different perspective: we first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large number of novel training samples which augment the original training set. To encourage the generator to produce label-discriminative samples, we train it via weighted maximum likelihood where the weight of each token is automatically adjusted based on a discriminative meta-learning objective. A classification PLM can then be fine-tuned on both the few-shot and the synthetic samples with regularization for better generalization and stability. Our approach, FewGen, achieves an overall better result across seven classification tasks of the GLUE benchmark than existing few-shot learning methods, improving on no-augmentation methods by 5+ average points and outperforming augmentation methods by 3+ average points.

Phy-Taylor: Physics-Model-Based Deep Neural Networks

Yanbing Mao , Lui Sha , Huajie Shao , Yuliang Gu , Qixin Wang , Tarek Abdelzaher

Category: Machine Learning | Artificial Intelligence

2022-09-27

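Purely data-driven deep neural networks (DNNs) applied to physical engineering systems can infer relations that violate physical laws, leading to unexpected consequences. To address this challenge, we propose a physics-model-based DNN framework, called Phy-Taylor, that uses physical knowledge to accelerate the learning of compliant representations. The Phy-Taylor framework makes two key contributions. It introduces a new architectural physics-compatible neural network (PHN), and it features a novel compliance mechanism that we call Physics-guided Neural Network Editing. The PHN aims to directly capture nonlinearities inspired by physical quantities, such as kinetic energy, potential energy, electrical power, and aerodynamic drag. To do so, the PHN augments neural network layers with two key components: (i) terms of the Taylor series expansion of nonlinear functions that capture physical knowledge, and (ii) a suppressor that mitigates the influence of noise. The neural network editing mechanism further modifies network links and activation functions to be consistent with physical knowledge. As an extension, we also propose a self-correcting Phy-Taylor framework that introduces two additional capabilities: (i) physics-model-based learning of safety relationships, and (ii) automatic output correction when safety is violated. Through experiments, we show that (by directly expressing hard-to-learn nonlinearities and by constraining dependencies) Phy-Taylor has considerably fewer parameters and a markedly accelerated training process, while offering enhanced model robustness and accuracy.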

Placing Human Animations into 3D Scenes by Learning Interaction- and Geometry-Driven Keyframes

James F. Mullen Jr , Divya Kothandaraman , Aniket Bera , Dinesh Manocha

Category: Computer Vision

2022-09-13

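We present a novel method for placing a 3D human animation into a 3D scene while maintaining any human-scene interactions in the animation. We use the notion of computing the most important meshes in the animation for the interaction with the scene, which we call "keyframes". These keyframes allow us to better optimize the placement of the animation in the scene, so that interactions in the animation (standing, lying, sitting, etc.) match the affordances of the scene (e.g., standing on the floor or lying in a bed). We compare our method, which we call PAAK, with prior approaches, including POSA, PROX ground truth, and a motion synthesis method, and highlight the benefits of our method through a perceptual study. Human raters preferred our PAAK method over the PROX ground truth data 64.6% of the time. Additionally, in direct comparisons, raters preferred PAAK over competing methods, including 61.5% of the time compared to POSA.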
